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Abstract 

We develop a fast method for optimally designing experiments in the context of statistical 
seismic source inversion. In particular, we efficiently compute the optimal number and 
locations of the receivers or seismographs. The seismic source is modeled by a point moment 
tensor multiplied by a time-dependent function. The parameters include the source location, 
moment tensor components, and start time and frequency in the time function. The forward 
problem is modeled by elastodynamic wave equations. We show that the Hessian of the 
cost functional, which is usually defined as the square of the weighted L2 norm of the 
difference between the experimental data and the simulated data, is proportional to the 
measurement time and the number of receivers. Consequently, the posterior distribution of 
the parameters, in a Bayesian setting, concentrates around the “true” parameters, and we 
can employ Laplace approximation and speed up the estimation of the expected Kullback- 
Leibler divergence (expected information gain), the optimality criterion in the experimental 
design procedure. Since the source parameters span several magnitudes, we use a scaling 
matrix for efficient control of the condition number of the original Hessian matrix. We 
use a second-order accurate finite difference method to compute the Hessian matrix and 
either sparse quadrature or Monte Carlo sampling to carry out numerical integration. We 
demonstrate the efficiency, accuracy, and applicability of our method on a two-dimensional 
seismic source inversion problem. 

Keywords: Bayesian experimental design. Information gain, Laplace approximation, 

Monte Carlo sampling. Seismic source inversion. Sparse quadrature. Uncertainty 
quantihcation 


1. Introduction 

In seismic source inversion, the source parameters can be estimated based on minimizing a 
cost functional, which is usually given by the weighted L2 norm of the difference between the 
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recorded and simulated data. The simulated data are obtained by solving a complex forward 
model, which is described by a set of elastic wave equations. The recorded data are usually 
the time series of ground displacements, velocities and accelerations, recorded by an array of 
receivers (seismographs) on the surface of the ground and in observation wells. On the other 
hand, if we treat the source parameters as random variables, we seek a complete statistical 
description of all parameter values that are consistent with the noisy measured data. This 
can be achieved using a Bayesian approach by formulating the inverse problem as a 
statistical inference problem, incorporating uncertainties in the measurements, the forward 
model, and any prior information about the parameters. The solution of this inverse problem 
is the set of posterior probability densities of the parameters updated from prior probability 
densities using Bayes theorem. Meanwhile, the maximum a posteriori (MAP) estimation is 
obtained by minimizing a cost functional, dehned as the negative logarithm of the posterior. 

Considering the hnancial and logistic costs of collecting real data, it is important to 
design an optimal data acquisition procedure, with the optimal number and locations of 
receivers. In the current work, we assume that there is additive Gaussian measurement 
noise and model the seismic source by a point moment tensor multiplied by a time-dependent 
function. The parameters include the source location, moment tensor components, and start 
time and frequency in the time function. There are in total Ng = 7 parameters in the two- 
dimensional model and = 11 parameters in a three-dimensional model. We then consider 
the problem of optimal experimental design in a Bayesian framework. Under this Bayesian 
setting, a prior probability density function (pdf) of the source parameters is given based on 
expert opinion and/or historical data, and the effect of the measured data is incorporated 
in a likelihood function. A posterior pdf of the parameters is then obtained through Bayes 
theorem by the scaled product of the prior pdf and the likelihood function. To measure the 
amount of information obtained from a proposed experiment, we use the expected Kullback- 
Leibler divergence, also called the expected information gain. It is specihcally dehned as 
the marginalization of the logarithmic ratio between the posterior pdf and prior pdf over 
all possible values of seismic source parameters and the data. The optimal experimental 
setup will then be the one that maximizes the expected information gain. Finding such 
an optimal experiment requires calculating the expected information gains corresponding to 
many possible setups. See Q for more details. 

The common method for estimating the expected information gain is based on sample 
averages, which leads to a double-loop integral estimator [10, 120. This approach can be 


prohibitively expensive when the simulated data are related to solutions of complex partial 
differential equations (PDFs). Hence, in such cases, such as seismic source inversion, more 
efficient approaches are required. 

In this paper, we develop a new technique for efficiently computing the expected in¬ 
formation gain of non-repeatable experiments arising from seismic source inversion. The 


efficiency of the new approach, which is based on our recent work in [12|, results from the 
reduction of the double-loop integration to a single-loop one. This reduction can be accu¬ 
rately performed by Laplace approximation when the posterior distribution of the source 
parameters is concentrated. As the main contribution of the current work, we show that 
the posterior pdf concentrates around “true” source parameters, due to the fact that the 
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Hessian of the cost functional is proportional to the number of receivers and measurement 
time. Consequently, the error of our approximation diminishes by increasing the number of 
receivers and recording time and by improving the precision of our measurements. Hence, 
we extend the methodology in [l^ from repeatable static problems to non-repeatable time- 
dependent problems of natural earthquakes. From a mathematical point of view, we seek 
the concentration of posterior probability distribution conditioned on a time series of data 
instead of repetitive experiments. We also carry out a rescaling of the original parameters 
to address the issue of an ill-conditioned Hessian matrix stemming from the large span of 
the parametric magnitudes in the seismic source term. The integrand of the approximated 
expected information gain is a function of the rescaled posterior covariance matrix, which 
can be obtained by solving N 0 + 2 forward problems, consisting of iVg -|- 1 primal problems 
and 1 dual problem. 

The remainder of this paper is organized in the following way. In Section 2, we formulate 
the experimental design problem for seismic source inversion and briefly introduce the cost 
functional and the expected information gain for a given experimental setup in the Bayesian 
setting. We present the approximated form of the expected information gain based on 
Laplace approximation and derive the rate of errors in Section 3. In the same section, we 
also summarize the hnite difference method for solving the forward problems, the adjoint 
approach for obtaining the Hessian matrix, and the sparse quadratures and Monte Carlo 
sampling for numerical integration. In Section 4, we consider numerical examples for a 
simplihed earthquake and design optimal experiments. Conclusions are presented in Section 
5. 

2. Experimental Design for Seismic Source Inversion 

In this section, we hrst state the seismic source inversion problem. We then dehne the 
expected information gain in a Bayesian experimental design framework for seismic inversion. 

2.1. Deterministic full waveform seismic source inversion 

We hrst consider the full waveform seismic source inversion problem in a deterministic 
setting, which is stated as a PDE-constrained optimization problem. The PDEs are given 
by elastodynamic wave equations in a compact spatial domain, D C d = 2,3, with 
a smooth boundary, dD. We augment the PDE with homogeneous initial conditions and 
different types of boundary conditions. The initial-boundary value problem (IBVP) reads: 


i/(x) Utt{t, x) - V ■ cr(u(f, x)) = f(f, x; 6) 

in [0, T] X D, 

(la) 

p 

p 

0 

on {t = 0} X D, 

(lb) 

cr{u{t, x)) -£1 = 0 

on [0, T] X dDo, 

(Ic) 

ut{t, x) = B(x.) cr{u(t, x)) ■ h 

on [0, T] X dDi. 

(Id) 

The IBVP solution, u = (mi, ..., Ud)~'^ , is the displacement held, with t and x = (xi,., 

■ •Pd)^ 


the time and location, respectively, and cr the stress tensor, which in the case of elastic 
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isotropic materials reads 


(t(u) = A(x) V ■ u J + /i(x) (Vu + (Vu)"^), 


( 2 ) 


where / is the identity matrix. The material properties are characterized by the density, 
and the Lame parameters, A and p. Two types of boundary conditions are imposed on the 
boundary, dD\ a homogeneous Neumann (stress-free) boundary condition (ITc)) on ODq, and 
an absorbing boundary condition flldj) on dDi = dD \ ODq, where h is the outward unit 
normal to the boundary, and B is a given matrix, see for instance jl, [l^ . We visualize such 
a compact domain together with stress-free and absorbing boundary conditions in Figure 
[U The system ([1]) admits longitudinal (P or pressure) and transverse (S or shear) waves, 
which, in the case of constant density, propagate at phase velocities 

Cp = v^(2/i -h A)/z/, Cs = 


respectively. There can also be surface waves traveling along a free surface, as well as waves 
that travel along internal material discontinuities. 

The function f represents the seismic source. We consider the case of a point moment 
tensor source, 

f(f, x; 0) = S'(t) M V5(x — Xg), (3) 

located at x^ = (xi^,..., x^s)^ G iA, where V(5 is the gradient of the Dirac distribution. The 
source time function, S(t) = S'(t; w^), depends on two parameters: a time shift, tg, and a 

frequency parameter, Ug. The moment tensor, M, is a constant symmetric matrix. 


M = 


m 


XlXl 




V 


m 


xixa 


m 


t)dxd 


3:d^d 




The source parameter vector, 6 G consists of Ng = dim(0) = + ^d +2 parameters. 


6 = (xi5,. 


• ) ^dsj tg, UJg, TTl 


X ^ XlI • • 


■ ; f^XdXd 


T 


There are Ng = 7 and W = H parameters, when d = 2 and d = 3, respectively. 

In the seismic source inversion problem, given > 2 recorded waveforms, the goal 
is to hnd the source parameter vector, 6 . This is achieved, for instance, by minimizing a 
full waveform cost functional, which is given by the difference between the recorded and 
simulated waveforms. 


x{e) 



dr(f) 1^ dt. 


( 4 ) 


Here, u(f,Xr) and dr(f), with r = are the simulated and recorded waveforms 

at the r-th recording station, respectively, and |v| denotes the magnitude of the vector, 
V G In practice, the data are recorded at W discrete time levels tm G [0,T], where 
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0 = to < < • • • < ^7Vt-2 < ^Nt-i = T. Moreover, the problem ([I]) cannot be solved 

analytically and needs to be discretized. The discrete cost functional corresponding to (j4]) 
may therefore be written as 


Nt — l 


= o E E K - 


(5) 


r=l m=0 


where u™ u(tm,Xr) is a hnite difference approximate solution to ([1]), assuming that all 
recording stations coincide with grid points, i.e., = Xi^ for some d-dimensional index 

vector, h = (*ir, • • • ,idr)- See Section 13.3.11 for more details on the hnite difference approxi¬ 
mation of ([I]). 

It is to be noted that the inverse problem based on the minimization of ([5]) is generally 
an ill-posed problem when the dimension of the parameter vector Nq is large. This means 
that inhnitely many parameter vectors 6 match the recorded data. In such cases, in order 
to obtain a well-posed problem, the cost functional ([5]) is often augmented by additional 
regularizing terms. See for instance 2^ for more details. Here, we focus on the case where 
Nq is small compared to the number of measurements and the problem is well-posed. 

The above inversion techniques are deterministic and do not take into account the un¬ 
certainty in the measurements. Therefore, in the next section, we consider a Bayesian 
framework that accounts for the uncertainty in the problem. 


2.2. Bayesian inference and experimental design for seismic source inversion 

Here, we consider the inversion and the experimental design problems in a Bayesian 
framework by including uncertainty in the form of additive noise in the measurements. 

Let ^ G be a given experimental setup, which is a vector of design parameters. For 
instance, it may consist of the number, N^, and the location, {x,.}^]^, of the seismographs, 
recording the wave forms, Moreover, let 

g, = g,{t, e, : [0, T] x x ^ R^ (6) 

be a forward model for computing the vector of outputs at the r-th recording station, given 
a source parameter vector, 0, and an experimental setup, We further assume that there 
is additive Gaussian measurement noise and collect observation vectors, 

+ G, with r = l,...,iV^, (7) 

using the same experimental set-up, Here, 6* is the A'^^-dimensional vector of ’’true” 
parameters used to generate synthetic data, and is assumed to be additive independent 
and identically distributed (i.i.d.) Gaussian noise, ~ A/'(0, Ce), corresponding to the r- 
th observation. Hence, the deterministic output vector, gr{t,d*,^), is the mean of the 
observation vector, Vrit). Moreover, the collection of Nr observed data points, 
is i.i.d., given specihc values of t, 0*, and Note that, in practice, the observations are 
recorded at W discrete time levels, {tm}m=o- 
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In seismic inversion, the ontpnts are usnally the waveforms at the recording stations 
obtained by discretizing the forward problem We therefore consider the following 

parameter-to-observable map for the forward model (|6i) : 

gr = u(fm,x,,; 0,^), with r = 1 ,...,Nr, m = 0,...,W-l, (8) 


where, we have abused the notation to emphasize the indirect dependence of u(tm,Xr) on 6 
and 

The Bayes theorem gives the posterior pdf of the parameter vector as 




p{{yr}W^^)p&{^) 

PiiVrW 


We note the fact that p{{yr}\^) is a scaling factor, which does not depend on 6. The 
posterior of the parameter vector is proportional to the product of the likelihood function 
and the prior pdf. 


Nr Nt-1 

P©(^|{yr}l^) OC exp(^-- ^ (9) 

r=l m=0 

Here, the residual for the r-th measurement at the m-th time level reads 

^r{tmi ^1 Y) ■ yri^rni Qr^rni yrij'rn^ ^ Qrij^rni T Gr ) (i-0) 

where 6 is the generic unknown parameter to evaluate the posterior. 

We then dehne a cost functional as the negative logarithm of the posterior, 

.. Nr Nt — l 

C(e) := -log(pe(e|{!,,},5)) = E - h(e)+C, (11) 

r=l m=0 

where h{6) = log(p©(0)), and C is a constant. 

Maximizing the likelihood amounts to minimizing the cost functional fITT]) . We also 
note the direct relation between the cost functional (ITT]) and the cost functional ([5]) in the 
deterministic setting, where only the hrst term in (ITT]) is retained and (7^ is an identity 
matrix. The second term in (ITT]) is related to the regularizing terms, which can be added to 
(I5|) to obtain a well-posed deterministic problem. Moreover, we derive our approximation 
of the expected information gain for a hxed value of which is not treated as a variable in 
the rest of the paper. We also do not write it as a condition in a probability distribution for 
the sake of conciseness. 


2.3. Expected information gain 

The Kullback-Leibler divergence is a non-symmetrical measure of the distance between 
two probability distributions ll|. It is related to many other statistical invariants. For 
example, if we treat the posterior pdf as a small perturbation of the prior pdf, the Hessian 
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of the Kullback-Leibler divergence is the Fisher information matrix. The Kullback-Leibler 
divergence measures how much information the data carries about the parameter. When 
designing an experiment in the context of parameter inference, it is important to maximize 
the expected information gain. 

The Kullback-Leibler (K-L) divergence for a given experiment reads 

Ckl(!/J:= / log P«<^l^-» pe(9|{y.})rfg. (12) 

J& 

where p&{0) and p&{0\{yr}) are respectively the prior and posterior pdfs of the unknown 
random parameter, 6. 

The corresponding expected K-L divergence, which represents the expected information 
gain in the unknown parameters, 0, is then given by: 

/:=Ey|DKLl= [ [ (13) 

JyJ& 

As mentioned in the introduction, using the direct sample average to estimate the expected 
information gain leads to a double-loop summation. In the next section, we present a fast 
technique for computing f[T3|) based on Laplace approximations, motivated by [H, . 

3. Fast Estimation of Expected Information Gain 

In [l^, it is shown that if we are able to carry out M repetitive experiments and if M 
is large, the expected information gain can be estimated by Laplace approximation with 
a diminishing error asymptotically proportional to M~^. In seismic source inversion, the 
large-M assumption is not fulhlled. Without losing generality, we show in this section that 
the error of the Laplace approximation also decreases when the number of receivers and 
the measurement time increase. Therefore, we can obtain a fast estimator of the expected 
information gain in the case of non-repeatable experiments. 

We consider the cost functional, C{0), in flTT]) and let 6 be its minimizer, 

0 = argmin£(d). (14) 

0 

We further make the following precise assumptions: 

• Assumption Al. The smallest singular value of the Jacobian of the output model, 
gj.(tm,0), with respect to 6 is bounded, uniformly in 0 and f, from below and away 
from zero by a constant. 

• Assumption A2. The output model, gr{tm, 0), satishes g^. G \/tm & R+- 

The above two assumptions are used to estimate the magnitudes of quantities, e.g., fl32|) . 
and rates of errors in the approximations, presented in Theorems 1-4 and in the appendix. 

We now collect the main results of the paper in Section ITTl followed by the proofs in Sec¬ 
tion 13.21 We then present a fast numerical method for computing the expected information 
gain in Section 13.31 
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3.1. Main results 

In this section, we state the main resnlts. Here, we explain two notations: first, for 
two real vectors a = ( 01 , 02 )"'" and b = ( 61 , 62 )"'", we have Ve V^a o b = 

Second, Op denotes big-0 in probability, e.g., we write En = Op if and only if 

for a given e > 0, there exist a constant No and an integer K, such that for all N > Nq, 
PHenI > KN-^/^) < e. 

Theorem 1 . Under assumptions A1 and A2, the minimizer flT^ of the cost functional flTT]) 


is given by 

e = 0* + Op{N-^/A^ 

(15) 

or by 



b = e* + H{o*) 

Nr Nt-1 

C:~^^ eg Atm, 0*) +'^eh{0*)) +Op{N-A, 

r=l m=0 

(16) 


where the total number of measurements is 


N = NRxNt, 


and the Hessian of with respect to 0 is 

Nr Nt-1 

H{0*) = E E {^egr{tm.e*y C:^Negr{tm.e*)-NeNegr{tm.e*)oC:^ e^YNeVehiO*). 

r=l m=0 

Theorem 2. (Gaussian approximation of the posterior) The posterior pdf in ([H]) can be 
approximated by a Gaussian pdf as follows: 


p&{d\{yr}) 


exp 


l{e-eyH{ 0 ) ( 0 - 0 )) 

-^- — exp 

(2 7r)^«/2|/f(0)|V2 



=pe(e|{y,}) + Op(|e-et). 




(17) 

(18) 


where 

H{e) = H,{d) + H2{e)-VeV0h{e), ( 19 ) 

with 

Nr Nt-1 

HiiO) = E E "^ogAtm, 0V C:^ ^egritm, O) = 0{N), (20) 

r=l m=0 
Nr Nt-1 

H2{e) = -J2Y1 ^eVegAtm. 0) O C:^ {gAtm. 0*) - gAtm, 0)) 

r=l m=0 
Nr Nt-l 

-HH ^e^egAtm, 0) o er = Op{N^/A- 

r=l m=0 
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( 21 ) 





Theorem 3. Under assumptions Al and A2, the expected information gain is given by 


-iiog((2.)-. |H(e)-|) - ^ - hw - HHieUvvHe)) - 

+ O (N-^) , (22) 

where H{0) is given by (USD-dai). 

Theorem 4. Under assumptions Al and A2, the expected information gain is given by 

1= [ DKLiO*)p{9*)de*+ 0{N-^) , (23) 

J& 

with 

Dkl = -i|og((27r)''«|/fi(r)-‘|) -tf- h(e>). (24) 

Here, Hi{6*) is given by fl2U]) with 6 replaced by 6*, which is the “true” parameter generating 
the synthetic data, cf. o 

Remark 1. The expected information gain, I, in fll3p can be approximated by the integral 
term in fl2^ with an asymptotic error proportional to N~‘^. The expected information gain 
can also be approximated by the integral term in fl2^ with an asymptotic error proportional 
to N~^. The dimension of the integration domain in Theorem|l]is less than that in Theorem 

El 



3.2. Proof of the main results 

In this section, we collect the proofs of the main results. 

Proof of Theorem [H We hrst hnd Z{6) so that 

0 = argmin £(0) = argmin ^(0). (25) 

0 0 

By (ITOD and (ITT]) . we have 

. Nji Nt — 1 

m = 2(9) + - 5; 5; cr‘ £, + C, (26) 

r=l m=0 

where 

.. Nji Nt — 1 

E (9Atr,.,e-) - gAira.ep cp (gAt„.,e-) - gAtra,e))+ 

r=l m=0 

Nr Nt-1 

E E (9,(*™.e*)-9.((m,9))’'c'r'cr-M9)- 

r=l m=0 
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Note that the last two terms in fl26|) are independent of 6. We then have 


Nr Nt-1 

VeZ{e) = idritm, 01 - 9r{tm, ^ 7 ^ Ve9r{tm, 0) " 

r=l m=0 

Nr Nt-l 

^ C:^Ve9r{tm.0) -Veh{0), 

r=l m=0 

where 

Nr Nt-l 

VeVeZ{0) = -Y, Y. ^eV09r{tnt,0)oC:^ 0 *) - 0 )) + 

r=l m=0 
Nr Nt-l 

Y Y ^e9r{tm,0V C:^Ve9r{tm,0)- 

r=l m=0 
Nr Nt-l 

Y Y ^e^e9rl'm,0) -Ve^eh{0). 

r=l m=0 

The Taylor expansion of 'WeZ[9) in flTTIl around 0* reads 

VgZ{0) = VeZ{0l + {0- 6 >*)^ VeVeZ{01 + O{\0 - 0*\1. 

We now evaluate fl2^ at 0 = 0, and noting that VgZ{0) = 0 by fl25l) . we write 
0-0* = -{V0V9Z{0*))~^ V0Z(0*)~^ + O(\0 - 0*\1. 

From (127|) . we obtain 

Nr Nt-l 

V0Z{01 = -YY^^^ ^e9ritm, 01 - V0h{01 = Op(iVV2), 

r=l m=0 

where the order, Op(7V^/^), follows Appendix A. 

Moreover, from fl28|) . we obtain 

Nr Nt-l 

V0V0Z{01 = Y Y '^e9r{tm.0N C:^'^09r{ttn.01- 

r=l m=0 
Nr Nt-l 

Y Y ^e^e9ritm,01oC:^er-V0V0hi01 

r=l m=0 

= 0{N) + Op{N^/l = Op{N), 

where the orders Op{N^^l and 0{N) of the second and first terms in the right-hand 
follow Appendices B and C, respectively. The proof is completed by fl30|) - fl32|) . 
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Proof of Theorem [2l Let C{6) be the second-order Taylor expansion of C{6) around 0, 

c{e) = c{e) + VeC{e) ( 0 - 0 ) +1 ( 6 >- e)^VeVeC{e) (e-e). 

The first term in the right hand side is independent of 6 . Moreover, by fl25|l . the second 
term is zero, since \/ 0 C{ 6 ) = 0. We are therefore left only with the third term. Similar to 
the proof of Theorem [H flT^ follows easily. Furthermore, the growth order in (120]) follows 
in a similar way to Appendix C. It is left to show flTI]) . By flTU]) . and similar to the proof of 
Theorem [H we write 

Nr Nt-l 

H 2 {e) ^e^egritm, 0 ) o C:^ e,- 

r=l m=0 
Nr Nt-l 

0) O C-^ 0*) “ OAtm, 0)) ■ (33) 

r=l m=0 

The first term in the right-hand side of fl33l) is of order Op{N^^A-i similar to Appendix B. 
For the second term in the right-hand side of fl5^ . we use Taylor expansion and write 

gAtm. 0) = gAtm, 0*) + VegAtm. 0*) {0-0*) + O{\0- r n. 

Then at 0 = 0, using f[T5|) . we have 

gAtm,0) - gAtm,0*) = Op{n-^/A- 

The hrst term in the right-hand side of fl33p dominates, and this completes the proof. □ 
Proof of Theorem [3l We first rewrite the information gain flT^ as 
Dkl= [ iog/' ps(e|to.}) W^(g|{ })^g+ 

J& \ P@[0) ) 

where p is the Gaussian approximation of the posterior, p, given in Theorem [2] by flTT)) . 


Then, we can write 

Dkl = Di + D 2 + + D 4 , (34) 

where 

T>i := [ log {p&{0\{yA)) P&{0\{yr})d0, (35) 

J& 

T >2 := - log {ps{0)) p&{0\{yA)d0 = - h{0)p&{0\{yA)d0, (36) 
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The first term fl35l) reads 

D. = -ilog((2ir)"*|C|)-^, C-.= H(e)-\ (39) 

For the second term fISBD . we first Taylor expand h{0) = log(p©(0)) about 0 to obtain 

m = ^ - er + Op{\e - e\% 

l«l<4 

where ol G is a multi-index with the following properties: 

Ng Ng Ng 

iQi=xi“» 

2=1 2=1 2=1 

The odd central moments of the multivariate Gaussian, p, in f[T7|) are zero, and the parameter 
posterior covariance, C, is of order Op{N~^), due to Theorem[2l Moreover, the fourth central 
moment of this multivariate Gaussian is a quadratic form of the second moment, and hence 
is of order Op{N~‘^). Gonsequently, we have 


Do = 




Z - »)“ + Op(\e - 0 ?) pmyMM 


|q;|<4 


ol\ 




(40) 


Here, A : B = Yhi j is the component-wise inner product of two matrices, A = (Aij) 

and B = {Bij) of the same size. 

Next, we consider the third term fl37|) . Since the approximate posterior, p, is a second- 
order Taylor approximation of the log posterior, p, we have 


where hp = 
show that 


log 


\p^{0\{yr})) 


D^hp{e) 

ck! 


{e-e)^ + Op{\e 



log(p©(0|{^^})). Similar to the analysis of the second term, 1^2, we can easily 


^3 = Op{N-^). 


(41) 


Finally, for the fourth term fl55D . we have 


D, 



( pM{yr}) \ 

V psio) ) 

Y,^^^{o~er + Op{\o 




^P&{6\{yr])d0. 
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After the first-order Taylor expansion of the exponential term, we obtain 


DA 



f Pe(^|{?/r}) \ f 

V PeW 


D^KiO) 

a! 


(6>-0)“ + Op(|6> 




Since log is asymptotically Op (log(iV)), and the third moment of a multivariate 

Gaussian is zero, we have 


f log(^^^A^A^)op(\0-0f)Pe(e\{y,})d0 = Op(N-Hog(N)). (42) 

J& V P&i^) ) 

This is the fourth moment of the Gaussian posterior, p, which has already been shown to 
be inversely proportional to iV^. 

Substituting fl3^ - fH2|l into ([34]), we obtain 


Dkl = -ilog((2,r)"«|C|) - ^ + h(0) + ^ + Op {N-Hog{N)) . (43) 

After marginalization over data, and conditioning on the “true” parameter, which has a pdf 
identical to the prior of the unknown parameters, the approximation and error estimation 
for the expected information gain fl2^ is obtained. This completes the proof. □ 

Proof of Theorem [H Using Woodbury’s formula [sj to invert flT^ . we have 




= H^{e) + H2{e)-VeVeh{e) 

=Hi{6Y^ + Hi{e)-^L{K-^ + RHi{0)-^L)-^RHi{e)-^ 


(44) 


with 112 ( 6 ) — VeVeYO) = LAR the corresponding eigenvalue decomposition. The second 
term on the right-hand side of fl4T)) is of order Op{N~Y- Hence, 

H(d)-^ = Hi(d)-^ + Op{N-i). 

We therefore can approximate the information gain, Dkl, in by 

Dkl = Dkl + Op{N~^), 


where 


Dkl ■■= -ilog((27r)'^-|Hi(e)-‘|) -^ + h(0) + H,{0)-LVVh(0) 
Next, we Taylor expand Dkl about 6*, 


Dkl{9) = Dkl{6*) + VDKL{ey {6 - 6*) + Op{\G - 9 


1 * | 2 \ 


(45) 
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Now, by rearranging the expected information gain, we get 


1= [ DKLmp{e*)de* + [ [ VDKL{0*f{0 - e*)de*p{{er.})d{er} + o{N-^). (46) 

J& J& J{e,} 

By Theorem [1], as iV —)■ oo, the difference, 6 — 6 *, is asymptotically proportional to 
(see (US])); in other words, Y.^=i Y.m=o C~~'^'\/egr{tm,0*) = Op[N^/‘^) is dominant 
over V 0 h{O*) = O (1), and H( 6 ) is dominated by its deterministic component, Hi(0) (see 
Theorem [2|). Since is a centered Gaussian random vector, the second integral on the 
right-hand side of fl46l) vanishes. Therefore, the approximated expected information gain as 
shown in flT6|l has an error of order and the proof is complete. □ 

3.3. Fast numerical approach 

3.3.1. Finite difference approximation of the problem 

Consider a rectangular spatial domain, D = [—Li,Li] x J— ^2,0], in We employ a 
second-order accurate hnite difference scheme, proposed in [l6|], for solving ([T]). Let h = 
2 Li/(iVi — 1) = L 2 /{N 2 — 1) > 0 denotes the spatial grid-length, where Ni and W are 
the numbers of grid points in the Xi and X 2 directions, respectively. Let i = {i,j), and for 
i = 1 ,..., iVi and j = 1 ,... ,N 2 , consider the computational grid 

Xi = {xii, X 2 j) = {-Li -h (i - 1) h, -L 2 + {j -l)h). 

Denote by Ui(t) = (unit), U 2 i(t))~^ the semi-discrete approximation of u(f, Xi). For interior 
grid points and grid points on the free surface boundary {dDo = {x : xi G [—Li,Li],X 2 = 
0}), we have 

i^(xi)ui(f) = A(ui(f))-Ff(f,Xi;0), XiGD\(9T>i, (47) 

where f (f, xi; 6 ) is a discretization of the singular source term, f(t, x; 6 ), and Ah is a difference 
operator containing standard operators D_, D+, Dq in both spatial directions and A, /i, and 
h, see [1^. On the boundary, dDi, we discretize the boundary condition (lTd|) . 

Ui(t) = Bhiui{t)), Xi G dDi, (48) 

where Bh is a difference operator containing D_, D+, u, A, p, and h. 

We further collect the semi-discrete solution, Ui(t), at all W = Ni x N 2 grid points 
in two 7V/i-vectors, Uiif) and U 2 {t). We also collect f(t, Xi;0) at all Nh grid points in two 
W-vectors, fi{t] 6 ) and /2(t;0). We hnally combine the semi-discrete formulas, flT7|) and 
fl48|l . and write them in matrix form: 

Ji Uiit) + I 2 Ui{t) = Ai Uiif) + A 2 U2{t) + C fi{t] 6), 

hU2{t) + l2U2{t) = BiUi{t) + B2U2{f) + C f2{t-, 0). 

Here, all matrices, Ai, A 2 , Hi, B 2 , G, Ji, I 2 , are in and Ji -|- /2 = /. 

The full discretization is obtained by discretizing the time domain [0,T] into W equidis¬ 
tant time levels, tm G [0,T], where 0 = to < H < • • • < tNt -2 < twt-i = T with a time step 
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At = T/{Nt — 1). We let and denote the full-discrete vectors, 'Ui(tm) and U 2 (tm), 
respectively, and use the following central difference formulas: 


~m+l _ 9 ~m I f.m-l 
-m Uj^ — A ~\- (JL^ 

'^k = 


At" ’ ^ 

We dually arrive at the full-discrete formulas 


Jim 

Uu = 


ar' - 


m—1 


2 At 


k = l,2. 


•'771+1 


f^lFD = I 


2 hf + h’l' 


m—1 


+ h 


u 


m+1 

1 


U 


~m—l 

1 


At" 

-+<-+h^-c'/r = o. 


2 At 


f^2FD = h 


-m+l 9 ~m _i_ 1 ~.m-l 

in ^ tin “F tin 


+ h 


Ur, 


Un 


At" 

- Si < - ^2 h™ - C /™ = 0, 


2 At 


(49) 


(50a) 


(50b) 


where = fk{tni] 0 ) with /c = 1, 2. 


3.3.2. Computation of the Hessian of cost functional 

In this section, we describe in detail how to compute the Hessian, VeVeS, of the cost 
functional, £, given in flTT]) . We consider the drst term in the right-hand side of fllll) and 
write 

Nt-l ^ Nr 

C,{e)= L(t^,<(+,<(+), ]L-.= -J2Mtm,0VC:^rr{t^,0), (51) 

777=0 7'=1 

where the residual r^, given in ffTOj) . is a function of ui and U 2 , which are in turn functions of 
tm and 0. The Hessian of the remaining terms in the right-hand side of flTT]) . i.e. —h{0) + C, 
is simply —VeVeh. 

To obtain the Hessian of £i, we drst introduce the Lagrangian: 

Nt-l 

ti= 5^(L(t™,<(0),d™(0))-t-(/pr^SiFD + </^^^S2FD). (52) 

m=0 

Since by fl50|) . Hifd = 0 and f/ 2 FD = 0, we may choose the Lagrange multipliers and 
freely. Consequently, we have = £i, and hence Veti = and VeVeti = 

In order to avoid long expressions, we set vI^q := Vevl^ and := Veflf, with /c = 1,2. 

Thanks to ([50]), we have 


Nt-l 


•'777+1 


. - _ 9 ~777 _i_ ~7?7—1 

V7 r \ ^ ( V7 IT JT.m I V7 TT ,r.m I ,^777+ f J ^10 ^ ^16 ' ^10 

Vel-i — 2_^ I -|- \/umLu20 -|- (7i- 


777=0 

~m+l _ ~m-l 
r “le 

^2 ■ 


+ h 


2 At 

, i‘T3 - 


u 


~m—l 

20 


2 At 


Hi uTe - A2 u^e - C fTe) + ^ 2 ^ {h 
B,hre-B2h^e-CfTe))- 


At" 


‘‘2 0 


‘‘2 0 


AH 
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Using summation by parts, it is easy to show that the following relations hold for k = 1,2, 


Nt-l 


E 


u 


m+l 

kO 


2 


m=0 




E 

m ,=0 


^k 


+ ^k 

Af 


huTe, 


and 


Nt-l 




~m+l _ 

'^ke '^kO 


Therefore, 


m=0 


Nt-l 


2 At 


1 ,_7n+l”^ ._m— 


m=0 


2 At 


2 «fc0- 


V0£i= - h 


m=0 
iT A ,^mT 


At^ 


2 At 


m+lT _ mT , 

,.mT A D \ \ fY7 ¥i^ 2 r2 ^r2 r 

V^i ~ ^^2 -Di j ^le + H-^-;-5- Ii 


At^ 


^2 -^2 
2 At 


I 2 ‘Px A 2 <^2 ^2)^26 Vl ^Jie ^2 ^121 


Hence, if we let yj™ and (p^ be the solutions to the following dual problems. 


$1FD = 


At^ 

^mT A ,«mT 


1"*” 

T ^1 -^1 T 

-'1- TTIT, --'2 


2 At 


$2FD = 


- ' Hi - ' Hi + VnpL = 0, 

v>r'^-2»>?^+v>r'^ 


At^ 

iT /t ,«mT 


r ^^2 - ¥^2 r 

-'1- TTIT, --'2 


2 At 


H2 - ' H2 + = 0, 


then 


M-i 


m=0 


v»£, = - hr‘"c/r,+ 


(53a) 


(53b) 


In a similar way, we can differentiate twice the Lagrangian with respect to 6 and use fl5U]) 
and fl5^ and summation by parts formulas to obtain 


V0V0A1 — Hi + Hu, 


where 


A^t-l 

Hi=Y. («.)^(V,5^V,5.L)«,) + «,)^(V,^V,J.L)«,)), 

m=0 

Nt-l Nh 

ff;; = - E Y.{^^AC)NefTe{i, 0 + 

m=0 i=l 


Q(",: 


(54) 

(55) 
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Here, {y)i G M means the i-th entry of a vector, y G and feii, ■) G means the 

i-th row of a matrix, fe G . Note that, in this case, we will have Vefei^i, :) G 

We hnally obtain the Hessian of the cost functional, 

V 0 V 0 /I =-H/+ if//— VeVgh, (56) 

where Hj and Hu are given by fl5^ - fl55|) . The computation of the second term of the 
Hessian, Hjj, requires solving one dual problem fl5^ and one full elastic wave equation fl5U]) 
to obtain and consequently V^i^L, with k = 1,2. However, calculating the first term. 
Hi, requires the quantities u^q and u^q, which satisfy the elastic wave equation with the 
force term fg. Therefore, in total, Ng + 2 wave equations must be solved to compute the 
Hessian (156|) . 

Remark 2. We note that in practice, we only compute the hrst part of the Hessian Hj 
(see Theorem H]) for which we need to solve Ng primal problems. If higher accuracy and 
consequently the computation of the second part of the Hessian Hu is needed (see Theorem 
13]), we also need to solve one primal and one dual problem (153]) . 


In order to further clarify the calculation of the Hessian, we address the following two 
issues: 

1. Calculation of and with k = 1,2. By the definition of L in fl5T]) . 

we have 

Nr 

V ^ ^ Tf {tiYi, 0 ) C*g V {imi ) 

r=l 

Note that by (]3]) and (ITU]) , we have 




fo ... 0 -1 0 ... 0\ 

(^0 ... oj’ 





-10.. 


0" 

o' ’ 


which are 2 x W matrices with zero elements except one element being —1 corresponding to 
the receiver, r. Similarly, we obtain G with zero elements except for Nn 

diagonal elements being cn at the locations corresponding to Nn receivers. We note that 
for computing V^^L, we need but is independent of h™. 

2. Discretization of the singular source function. We need to discretize the source 
function, f, in (J3]) so that it is twice continuously differentiable with respect to 6 (see 
(155]) b In other words, the gradient of the Dirac distribution, V(5(x — x^), needs to be 
discretized so that it is twice continuously differentiable in x^. For this purpose, we employ 
the technique proposed in to derive regularized approximations of the Dirac distribution 
and its gradient, which result in point-wise convergence of the solution away from the sources. 
The derivation of approximations of the Dirac distribution and its gradient is based on the 
following properties: 


J 0(x) (5(x - x^) dx = (p{xs), 


j 0 (x) cla;d(x - xj dx = -cl 2 , 0 (xj. 
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which hold for any smooth, compactly supported function, 0. In one dimension with a 
uniform grid, with grid size h, the integrals are replaced by a discrete scalar product, 
{p,q)i,h ■= h'^piqi. Fourth-order accurate approximations of the Dirac distribution and 
its gradient are for instance obtained when the integral conditions are satished with 0 being 
polynomials of degree four. Let < Xg < and a = {xg — Xk)/h. Then a fourth-order 
discretization of (5(x — x^), which is twice continuously differentiable in x^, is given by j^: 

4_2 = (a/12 - aV24 - aVl2 - 19aV24 + P{a))/h, 

5k-i = (-2a/3 + 2aV3 + a^/6 + 4a^ - 5P(a))/h, 

4 = (1 - 5aV4 - 97aVl2 + 10P(a))/h, 

4+1 = (2a/3 + 2aV3 - a^/6 + 49aV6 - 10P(a))/h, 

4+2 = (-a/12 - a^/24 + aVl2 - 33aV8 + 5P(a))/h, 

4+3 = (Sa'^/e - P(a))/h, 

= 0, j ^ {k — 1, k, k + 1, k + 2}, 

where P(a) = 5a®/3 — 7a®/24 — 17a^/12 -|- 9a®/8 — a®/4. Similarly, a fourth-order dis¬ 
cretization of d'(x — Xg) is given by 

4_2 = (-1/12 + a/12 + aV4 + 2a^/3 + R{a))/h^, 

4_^ = (2/3 - 4a/3 - aV2 - 7a^/2 - 5P(a))/h^ 

4 = (5 a/2 + 22 a^/3 + 10P(a))/h^ 

4^^ = (-2/3 - 4 a/3 + aV2 - 23 a^/3 - 10 P(a))/h^ 

4+2 = (1/12 + a/12 - a^/4 + 4 a^ + 5 R{a))/h‘^, 

Sk+3 = (-5aV6-i?(a))//i^ 

Sj = 0, j ^ {k - l,k,k + l,k + 2}, 

where R{a) = —25a^/12 — 3a®/4 -|- 59a®/12 — 4a^ -|- a®. A two-dimensional approximation 
can for instance be obtained by Cartesian products of one-dimensional discretizations: 

l(x - X.) =» - 12.). V^(X - X.) « ) ■ 


This representation of the forcing together with the second-order accurate finite difference 
scheme, presented in Section 13.3.11 result in an overall second-order convergence of the 


solution away from the singularity at (see also 14|). 


The complete algorithm for computing the Hessian is outlined in Algorithm [T] 


3.3.3. The scaled Hessian 

In seismic source inversion, the parameters span several magnitudes. For example, the 
time shift and frequency are O (1) and the source momenta are O (10^^), which potentially 
lead to a Hessian matrix with a large condition number. To deal with this problem, we carry 
out a change of variables as follows: 

0 = SO with S = \/diag{Hi). 
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Algorithm 1 Calculate the Hessian of the cost functional given a 6* 

Calculate Hi: 

1. Discretize the source function f and obtain fk{t] 6*) for k = 1,2. 

2. Calculate de^fkit] 6 *) for j = 1,..., by differentiating /^(t; 6 *) in step 1. 

3. Solve flSU]) with forces in step 2 and obtain for k = 1,2. 

4. Find for k = 1,2. 

5. Compute Hi by fl54D . 

Calculate Hu: 

6. Solve flsni) with forces in step 1 and obtain for k = 1,2. 

7. Find for k = 1,2. 

8. Solve fl53l) and obtain for k = 1,2. 

9. Calculate Vef'ke by differentiating d0.fk{t;0*) in step 2. 

10. Compute Hu by fISSD . 


Consequently, the rescaled Hessian reads: 

H = S-^HiS-\ 

and the information gain in the new scaled variables is: 

= - iiog((2^)"»|H-(e)-'|) - ^ + h{e) + ^ 

(57) 

where p^{9\{y^})= Pq(0) = p&{S~^e)\S\~\ and h(0) = logp^iO). 

We approximate 0 by 6*, such that Theorem 0] can be applied to compute the expected 
information gain. 


3.3.4- Numerical integration 

In this section, we briefly review two approaches for numerical integration, namely the 
deterministic sparse quadrature and Monte Carlo random sampling. Note that we use the 
sparse quadrature under the assumption that the corresponding integrand has a certain 
level of regularity. We should resort to sampling based numerical integration techniques, 
e.g., Monte Carlo sampling, when this assumption is not valid. Due to the singular source 
term and its twice continuously differentiable discretization, the solution of our problem 
does not possess high regularity with respect to the source location parameters. 

For details of sparse quadratures, see [3, [ 2 I, 0, [l3, l22|- Also, see [13, EH for a detailed 
regularity analysis and convergence study of the sparse quadrature for the stochastic wave 
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equation. We can use the interpolating polynomial-based numerical integration and write 


I 


r 

/ DKL{e*)p{e*)de* = y^w,DKL{e*) + e ,, 


(58) 


where and {wiY^^i are rj quadrature points and weights, respectively, IDkl is the ap¬ 

proximated K-L divergence given by (Elj), and Sn is the interpolation error. Different types 
of quadrature rules are available, including Gauss and Clenshaw-Curtis rules. In Gauss 
quadrature, quadrature points and weights correspond to the multivariate p-orthogonal 
polynomials. For instance, Legendre and Hermite polynomials are used for uniform and 
Gaussian priors, respectively. In Glenshaw-Gurtis quadrature, Ghebyshev polynomials are 
employed to obtain quadrature points and weights. 

In a standard quadrature rule on full tensor product grids, the total number of quadrature 
points, r], grows exponentially with Ng. Full tensor product approximations can therefore 
be used only when the number of parameters is small (in practice when Ng < 3). In 
order to suppress the curse of dimensionality, sparse quadrature rules are employed. The 
main strategy used in sparse quadrature is to leave out the hne levels of a hierarchical 
interpolation. A full tensor product rule is recovered if all the hierarchical bases are used. 
Sparse approximations are both accurate and efficient, particularly when the integrand, 
Dkl{G*), is highly regular with respect to 6*. In general, the following estimates hold for 
full tensor quadrature rules: 

Dkl e => 8, = 

and for sparse quadrature rules: 

Dkl e => s, = 


Here, H‘^ is the space of multi-variate functions with square integrable s > 0 weak derivatives 
with respect to each variable, and is the space of multi-variate functions with square 
integrable s > 0 mixed weak derivatives. See 18, 15, 1^ for more details. 

On the other hand, if we use Monte Garlo random sampling, the expected information 
gain can be written as 


I = 


1 ^ 

DKL(e')p(e') DKL{e-) + eu . 


(69) 


where 6* is the random sample drawn from distribution p{0*), and Em = Op 

We note that recent advances in Monte Garlo type methods, such as multilevel Monte 
Garlo and multi-index Monte Garlo j^, can be used to accelerate the computation of 
expected information gain, when the dimension is high and/or the integrand function lacks 
high regularity with respect to the parameters. 
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Remark 3. In addition to the quadrature error, Sr, or em, we also need to consider the 
discretization error in the computation of Hessian, which is proportional to where q 
depends on the order of accuracy of the finite difference scheme and the regularity of the 
wave solution. In practice, we take the spatial grid-length h small enough so that the 
discretization error does not dominate the quadrature error. 


4. Numerical Examples 

In this section, we present a few numerical examples to demonstrate the efficiency and 
applicability of the numerical method for the fast estimation of the expected information 
gain described above. 


4 . 1 . Model problem 

We consider a layered isotropic elastic material in a two-dimensional space and model a 
simplified earthquake, which is similar to the layer over half space problem called LOH.l j^. 
The top layer, Dj, extends over —1000 < X2 < 0, and the half space, Djf, is given by X 2 < 
— 1000. We truncate the domain and consider the box D = [—10000,10000] x [—15000,0]. 
We impose the stress-free boundary condition on the free surface, X 2 = 0, and the first-order 
Clayton-Engquist non-reflecting boundary conditions at the artificial boundaries. The 
material density and velocities are given by 


f 2600 

e Dj 

, . f 4000 

X G 

, , f 2000 X G T>/ 

\ 2700 

X G Djj 

Cp(x) - 1 gQQQ 

X G Djj 

'=*<’=> = 1 3464 X e Du 


A point moment tensor forcing is applied with a Gaussian time function. 


, tgy COg) 


-c^i (t-ts)V2 


which is parametrized by the frequency Ug and the center time tg. Except for the conver¬ 
gence study in Section 14.31 for all experiments, we consider uniform priors for all Ng = 7 
parameters: 


~W(-1000,1000), 02 ~W(-3000,-1000), 03 ~Z^(0.5,1.5), 


04 ~ W(3, 5), 05, 06, 07 ~ W(10^A 10^"). 

Here, the vector of parameters reads 6 = {xis,X2s,ts,Ug,mxixi,'m>xiX2T'^x2X2)~'^■ 

The array of receivers is placed on the ground surface. See Figure [1] The observation 
vector contains all displacements measured at the receivers. We assume that the measure¬ 
ment errors for the horizontal and vertical displacements at each receiver are independent 

Gaussian random variables, Sr ~ AA(0, CJ, with Cg = | ^ 0001 ) receivers. 


r = 1,..., Nji- We employ the second-order finite difference approximation proposed in [16 
with a fixed spatial grid-length, h = 200, and a time step. At = 0.025. The elastic wave 
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Figure 1: The two-layered spatial domain D = [—10000,10000] x [—15000,0] with stress-free and non¬ 
reflecting boundary conditions. An array of Nji receivers are located on the ground surface in equidistant 
recording points. 


equation is integrated to the time, T = 8. We note that h = 200 is the largest spatial grid- 
length for which the discretization error, which is proportional to 0 {h‘^), does not dominate 
the quadrature error in the integration of the information gain fl5^ with respect to the prior 
distribution. We record the wave solutions at W = 1 + T/At = 321 discrete time levels. 
For example. Figure [2] shows the ground motions at the receiver station, xq = (5000, 0), as a 
function of time, due to solving the elastic wave equation with the prior source parameters, 
E[0] = (0, -2000,1,4,10l^ 10l^ lO^^)^. 



t 



t 


Figure 2: The ground motions versus time at the receiver station, Xq = (5000,0). The motions are due to 
solving the elastic wave equation with the prior source parameters, E[0] = (0, —2000,1,4,10^^, 10^^, 10^^)^. 
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4-2. The scaled Hessian 

The sizes of the source parameters span many orders of magnitude. In SI units, we have 

01,02 = O(10^)m, 03 = 0(1) s, 04 = 0(10) 1/s, 05,06,07 = 0(10^=^)-0(10^®) Ns. 

Consequently, the condition number of the Hessian is very large. The Hessian there¬ 
fore is scaled, as described in Section 3.3.3. For more clarihcation, we compute the Hes¬ 
sian matrix. Hi, in fl5T)) for the expected value of the prior source parameters, E[0] = 
(0, —2000,1,4,10^^, 10^^, 10^^)^. The condition number of the unsealed Hessian computed 
by Matlab is cond (Hi) = 3.88 x 10^°. Now, if we use the scaling matrix, S G with 
diagonal elements, Su = and zero off-diagonal elements, then the condition number 

of the scaled Hessian, Hi = HiS~^, reduces signihcantly to cond{Hi) = 12.16. 


4-3. Convergence of sparse quadrature 

In this section, we numerically study the convergence of the two quadrature techniques 
based on sparse grids and Monte Carlo samples. We consider three parameters: 

02 ~W(-30 00,-1000), 04 ~W(3,5), 05 ~ W(10^^10^^) 


and leave the other four parameters hxed, i.e. 0i = — 1000,03 = 1, and 06 = 07 = 10^“^. 
We use a Gauss-Legendre sparse grid based on a total degree multi-index set. See [l^ for 
details on the construction of sparse grids. We consider a sequence of twenty sparse grids 
with rji = 351, 681,..., 271857 quadrature points. The sparse grids correspond to three 
directions and are obtained by total degree index sets. We then compute the relative error. 






41 


14 + 11 


7 = 1,...,19. 


We also consider a sequence of Mj = 10^, 10^, lO'^, 5 x 10^, 10® random samples and carry on 
ten realizations of each Mi. In a similar way as above, we compute the relative error SMi in 
the Monte Carlo sampling technique. Figure |3] shows the relative errors in sparse quadrature 
and in Monte Carlo sampling technique Em, versus the number of quadrature points p 
and the number of samples M. A simple linear regression through the data points shows 
that the rate of convergence of sparse qnadrature is 0.40, while the rate of convergence of 
Monte Carlo is 0.49. The slow rate of convergence in sparse quadrature is a result of low 
regularity of I with respect to 0. Specihcally, the solution of our problem does not have high 
regularity with respect to the parameters of source location, due to the singular source term. 
However, the regularity needed to satisfy Assumptions 1 and 2 can still be provided by the 
twice continuously differentiable discretization of the delta function in the source term. 


4.4- Comparison of Laplace method and nested Monte Carlo Sampling 

We numerically verify the concentration of measure by comparing the results of Laplace 
method (sparse quadrature) and direct nested Monte Carlo sampling. We assume three 
parameters are known with values identical to those in the previous subsection. We collect 
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Figure 3: The relative errors in sparse quadrature and Monte Carlo versus the number of quadrature points 
and samples. The rate of convergence, obtained by linear regression through the data points, is 0.40 for 
sparse quadrature and 0.49 for Monte Carlo sampling. 


data for a period of T = 1.25 at two receivers located at xi = —9000 and xi = 1000. 
The values of expected information gains computed by both methods with respect to the 
number of samples/quadratures are shown in Figure 01 Note that the Laplace method 
converges much faster than the nested Monte Carlo and the difference between the hnal 
results is less than 4%. The strong bias in the nested Monte Carlo is due to the fact that 
we reused the samples in the inner and outer loops. 



Figure 4: Comparison of the convergence performances of Laplace method and nested Monte Carlo sampling. 


4-.5. Experimental setups 

We carry out three sets of experiments for the model problem in Section 14.11 
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• Scenario I: The number of receivers and the distance between the receivers vary, but 
the interval on which the receivers are distributed evenly and symmetrically around 
xi = 0 is hxed, [-8000,8000]. In particular, we consider the following settings: 


Nr 

3 

5 

9 

17 

41 

81 

dR 

8000 

4000 

2000 

1000 

400 

200 


giving a total of six experiments, dn is the distance between two consecutive receivers. 

• Scenario II: The number of receivers varies, = 1, 3, 5,..., 19, and the distance 
between the receivers is hxed, dn = 1000. We distribute the receivers evenly and 
symmetrically around xi = 0. This gives a total of 10 experiments. 

• Scenario III: The number of receivers is hxed. Nr = 5, and the distance between the 
receivers varies, dR = 200,400, 600,..., 4000. We distribute the receivers evenly and 
symmetrically around xi = 0. This gives a total of 20 experiments. 

Figure |5] shows the expected information gain, computed both by Monte Carlo sampling 
with M = 10"^ samples and by sparse quadrature with 7 ] = 8583 quadrature points, for six 
experiments in scenario I. The expected information gain increases sharply until the number 
of seismograms reaches 20. The extra gains of information is marginal when the number of 
seismograms is more than 20, 



Figure 5: The expected information gain, computed both by Monte Carlo sampling with M = 10^ samples 
(together with 68.27% confidence interval) and by sparse quadrature with rj = 8583 quadrature points, for 
six experiments in scenario I. The confidence intervals are indeed very small, less than 1%. 

Figure |6] shows the expected information gain, computed both by Monte Carlo sampling 
with M = 10^ samples and by sparse quadrature with r] = 8583 quadrature points, for 10 
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Figure 6: The expected information gain, computed both by Monte Carlo sampling with M = 10^ samples 
(together with 68.27% confidence interval) and by sparse quadrature with rj = 8583 quadrature points, for 
10 experiments in scenario 11. 


experiments in scenario II. It shows that as we increase the number of seismograms, the 
information gain increases marginally. 

Figure [7] shows the expected information gain, computed both by Monte Carlo sampling 
with M = 10“^ and M = 10^ samples and by sparse quadrature with rj = 8583 and rj = 26769 
quadrature points, for 20 experiments in scenario III. It shows that the experiment with 
dfi = 1000 gives the maximum information. Both lumping and sparsifying the seismograms 
give suboptimal designs. 



Figure 7: The expected information gain, computed both by Monte Carlo sampling (together with 99.7% 
confidence interval) and by sparse quadrature, for 20 experiments in scenario III. 
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Figure [8] shows seven quantities of interest, with i = 1,... ,7, which represent the 
information gains of each parameter separately, computed by Monte Carlo sampling with 
M = 10^ samples for 20 experiments in scenario III. The experiment with approximately 
= 1000 gives the maximum information for 02, ^6 and O^. The experiment with approxi¬ 
mately dfi = 500 gives the maximum information for 9^. The experiment with approximately 
= 2000 gives the maximum information for 9^. However, sparsifying the seismograms 
does not induce a drop of information gain in 9i and 03 . 

Note that we simply sweep over the design spaces to search for the optimal designs 
because all the scenarios we considered are one-dimensional with respect to the experimental 
setup, In the cases that more freedom is allowed in a higher dimensional design space, 
more advanced optimisation algorithm should be implemented. 
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0 500 1000 1500 2000 2500 3000 3500 4000 

Figure 8: The expected information gain corresponding to each parameter separately, computed by Monte 
Carlo sampling with M = 10® samples for 20 experiments in scenario III. 


5. Conclusion 

We have developed a fast method of optimal experimental design for statistical seismic 
source inversion in a Bayesian setting. This method can be generally applied to the design 
of non-repeatable experiments with a time-dependent model, as long as the assumptions 
in Section 3 are fulhlled. Taking into account that the Hessian of the cost functional is 
proportional to the product of the number of points in the time series of the measurements 
and the number of receivers, we use Laplace approximation to derive an analytical form of 
the information gain, which is a function of the determinant of the aforementioned Hessian 
matrix. The expected information gain eventually reduces to a marginalization of the in¬ 
formation gain over all possible values of the unknown source parameters. The asymptotic 
error terms have been derived. We have applied the new technique to the optimal design of 
the number and location of seismic receivers on the ground for a simplihed two-dimensional 
earthquake. 
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Appendix A. 

In this appendix, we show that 

Nr Nt-l 

^e9r{tm,0*) = Op{N^/^). 

r=l m=0 

For simplicity and to avoid tensor notations, we consider only the one-dimensional case, i.e., 
d = 1. The case when d> 2 follows in a similar way and with no difficulty. 

We hrst note that when d = 1, 0) = uidmi Xr] 0), = U, and are scalar 

quantities. Now, let u{tm,x;0) be discretized on a spatial grid with grid points. We 
collect u on all grid points in a vector denoted by u{tm, 0)- By the chain rule, we have 

Veu{tm, Xr] 0*) = VuU{t^, Xr] 0*) V0u{t^, 0*). 

Moreover, since Xr] 0) = S{x — Xr), then 

Vuu{tm, Xr] 0) = [0,..., 0,1, 0,..., 0] =: R, 

where R denotes a 1 x vector whose elements are zero, except a 1 at the position of the 
r-th recorder in the grid, denoted by j{r). Therefore, 

Nr Nt — 1 Nr Nt — 1 

C:'^ ^egr{tm,0*) Veu{tm,01- 

r=l m=0 r=l ^ m=0 

Note that Veu is a x Nq matrix, and Ce)\ is a 1 x A/^ vector, whose ele¬ 
ments are zero, except at Np positions corresponding to the Np recording points. 

Then, the right-hand side in the above formula is a 1 x A^ vector, whose f-th element 
reads Ylir=i{^r/c^) J2m=o d 0 ^Uj(^r){tm, 0*), with i = 1, ..., Ng. The desired estimate follows, 
noting that J2m=o \^ 0 i^j(r)\ is bounded from below and above away from zero, thanks to 
assumptions A1 and A2. 

Appendix B. 

In this appendix, we show that 

Nr Nt-l 

E ^9^egritm,0*) O C:^ er = Op{N^/^). 

r=l m=0 
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As in appendix A, we consider only the case when d = 1. We have 

VeVec/r = Ve (ir = Ve'^eu o ij, 

where VeV qu \s a. Nq x Ng x tensor. Note that for two real vectors, a, b G MP, we use 
the notation VeVea o b = Ve Vea*. Therefore, we have 

Nr Nt — 1 Nt — 1 Nr 

Y1 ^e^egr{tm,0*)oC:^er= 

r=l 771=0 m=0 r=l ^ 

Similar to Appendix A, we can show that the right hand side in the above formula is a 
NgxNg matrix whose element (i, k) reads X!^i(^)-/ce) '^m=o 0*)- The desired 

estimate follows, noting that J2m=o ^ 9 i 9 k^ 3 ir) bounded from below and above away from 
zero, thanks to assumptions Al and A2. 

Appendix C. 

In this appendix, we show that 

Nr Nt-1 

E ^9gritm,0*V C:^Vgg,{tm,0n = 0{N). 

r=l m=0 

Again, we consider only the case when d = 1. Since Vgg^ = Veh, we have 

R Nt-l Nt-l Nr 

^ogritm, 0*V C:^ VggAtm, 0*) = W' Veu{t^, 0*V Y1 ^eu{tm, 0*). 

r=l m=0 m=0 r=l 


We note that ^ diagonal Nh x matrix whose diagonal elements are zeros, 

except elements {(j(r), j(r))}^]^, which are I’s. Therefore, the right-hand side in the above 
formula is a Ag x Ng matrix whose element (^,/c) reads J2m=o ^0i^i{'r)^9k^j(r)- 

The desired estimate follows, noting that J2m=o l^Si^jir) 99k^j(r)\ is bounded from below 
and above away from zero, thanks to assumptions Al and A2. 
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